We processed the exome-sequencing VCF files associated with the JHU-Biobank data with Ensembl’s VEP (Variant Effect Predictor) tool to produce Mutation Annotation Format (MAF) files. The MAF data from these files are stored in syn20546180.
The first half of this document shows the variant information associated with the Blood and NF samples. The second half shows copy ratio analysis at the chromosomal level for the Blood, NF, and MPNST samples.
The oncoplot below shows the types of variants found in genes of interest listed by the Pratilas lab. The Variant Classification is shown as a legend below the plot.
## [1] "Mutations in our genes of interest"
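As a rough illustration of what an oncoplot summarizes, the sketch below tallies variant classifications per gene and sample from a MAF-like table. It is not the code used for this report. The column names `Hugo_Symbol`, `Tumor_Sample_Barcode`, and `Variant_Classification` are standard MAF fields, but the rows, the sample labels, and the helper `tally_variants` are made up for the example.

```python
import csv
import io
from collections import Counter

# Toy MAF-like table: made-up rows for illustration, NOT the report's data.
TOY_MAF = (
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\n"
    "NF1\tblood\tMissense_Mutation\n"
    "NF1\tnf\tMissense_Mutation\n"
    "NF1\tnf\tNonsense_Mutation\n"
    "TP53\tnf\tMissense_Mutation\n"
    "TTN\tblood\tSilent\n"
)


def tally_variants(maf_text, genes_of_interest):
    """Count (gene, sample, classification) triples for the selected genes."""
    reader = csv.DictReader(io.StringIO(maf_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["Hugo_Symbol"] in genes_of_interest:
            key = (
                row["Hugo_Symbol"],
                row["Tumor_Sample_Barcode"],
                row["Variant_Classification"],
            )
            counts[key] += 1
    return counts


# Hypothetical gene list; the real list came from the Pratilas lab.
counts = tally_variants(TOY_MAF, {"NF1", "TP53"})
for key, n in sorted(counts.items()):
    print(*key, n, sep="\t")
```

Each printed row corresponds to one coloured cell (or part of a cell) in an oncoplot: a gene, a sample, and the class of variant observed there.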
The allele frequency of the specific variants according to gnomAD can be found below:
The series of lollipopPlots below show the putative location and amino-acid information associated with the variants in the above genes of interest.
The top lollipop refers to the variant in the normal Blood sample; the bottom one refers to the variant in the NF sample. The gene name and the selected transcript ID (beginning with “NM_”) are located in the top right-hand corner of each plot. When more than one transcript is found for a gene, the longest transcript is used for the visualization (the selected one is highlighted in the top right-hand corner).
A small caveat in these plots is that when a protein has two overlapping domains, the labels overlap as well. In the interest of readability, the font size was reduced slightly, but some overlaps were unavoidable. We are currently exploring other visualization tools to address this caveat.
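The longest-transcript rule can be sketched as follows, using the EZH2 transcript table printed later in this document. This is a minimal illustration, not the plotting library's own code; the `Transcript` type and the `longest_transcript` helper are hypothetical names. How ties are broken (for example, the two 393-residue TP53 transcripts) is not stated in this report; Python's `max` simply returns the first of the tied entries.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    hgnc: str
    refseq_id: str
    protein_id: str
    aa_length: int


# EZH2 rows copied from the transcript table in this report.
EZH2 = [
    Transcript("EZH2", "NM_001203249", "NP_001190178", 695),
    Transcript("EZH2", "NM_001203248", "NP_001190177", 737),
    Transcript("EZH2", "NM_152998", "NP_694543", 707),
    Transcript("EZH2", "NM_001203247", "NP_001190176", 746),
    Transcript("EZH2", "NM_004456", "NP_004447", 751),
]


def longest_transcript(transcripts):
    """Return the transcript with the largest protein length (first one on ties)."""
    return max(transcripts, key=lambda t: t.aa_length)


selected = longest_transcript(EZH2)
print(selected.refseq_id, selected.aa_length)
```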
## Gene: NF1
## HGNC refseq.ID protein.ID aa.length
## 1: NF1 NM_001042492 NP_001035957 2839
## 2: NF1 NM_000267 NP_000258 2818
## Gene: TP53
## HGNC refseq.ID protein.ID aa.length
## 1: TP53 NM_000546 NP_000537 393
## 2: TP53 NM_001126112 NP_001119584 393
## 3: TP53 NM_001126118 NP_001119590 354
## 4: TP53 NM_001126115 NP_001119587 261
## 5: TP53 NM_001126113 NP_001119585 346
## 6: TP53 NM_001126117 NP_001119589 214
## 7: TP53 NM_001126114 NP_001119586 341
## 8: TP53 NM_001126116 NP_001119588 209
## Gene: EZH2
## HGNC refseq.ID protein.ID aa.length
## 1: EZH2 NM_001203249 NP_001190178 695
## 2: EZH2 NM_001203248 NP_001190177 737
## 3: EZH2 NM_152998 NP_694543 707
## 4: EZH2 NM_001203247 NP_001190176 746
## 5: EZH2 NM_004456 NP_004447 751
## Gene: EGFR
## HGNC refseq.ID protein.ID aa.length
## 1: EGFR NM_005228 NP_005219 1210
## 2: EGFR NM_201284 NP_958441 705
## 3: EGFR NM_201282 NP_958439 628
## 4: EGFR NM_201283 NP_958440 405
## Gene: PDGFRA
## Gene: CCND3
## HGNC refseq.ID protein.ID aa.length
## 1: CCND3 NM_001136125 NP_001129597 220
## 2: CCND3 NM_001760 NP_001751 292
## 3: CCND3 NM_001136017 NP_001129489 211
## 4: CCND3 NM_001136126 NP_001129598 96
## Gene: KDR
## Gene: FLT4
## HGNC refseq.ID protein.ID aa.length
## 1: FLT4 NM_182925 NP_891555 1363
## 2: FLT4 NM_002020 NP_002011 1298
## Gene: FGFR4
## HGNC refseq.ID protein.ID aa.length
## 1: FGFR4 NM_002011 NP_002002 802
## 2: FGFR4 NM_213647 NP_998812 802
## 3: FGFR4 NM_022963 NP_075252 762
## Gene: AXL
## HGNC refseq.ID protein.ID aa.length
## 1: AXL NM_021913 NP_068713 894
## 2: AXL NM_001699 NP_001690 885
## Gene: AURKA
## HGNC refseq.ID protein.ID aa.length
## 1: AURKA NM_003600 NP_003591 403
## 2: AURKA NM_198433 NP_940835 403
## 3: AURKA NM_198434 NP_940836 403
## 4: AURKA NM_198435 NP_940837 403
## 5: AURKA NM_198436 NP_940838 403
## 6: AURKA NM_198437 NP_940839 403
## Gene: APC
## HGNC refseq.ID protein.ID aa.length
## 1: APC NM_001127511 NP_001120983 2825
## 2: APC NM_001127510 NP_001120982 2843
## 3: APC NM_000038 NP_000029 2843
## Gene: ATM
## Gene: SMARCA2
## HGNC refseq.ID protein.ID aa.length
## 1: SMARCA2 NM_003070 NP_003061 1590
## 2: SMARCA2 NM_139045 NP_620614 1572
Currently, our copy ratio analysis captures information at the gross chromosomal level, not at the gene level. A few observations emerge from the above plot:
The MPNST sample data are extremely noisy. Referring back to David Mohr’s email, they did not release VCF data for this sample because it did not pass their QC. This plot is consistent with that: the BAM file that was released (probably by mistake) contains extremely noisy data. Great caution should be taken when drawing any conclusions about this sample from the above plot.
The plots for the Blood and Neurofibroma samples look fairly similar, with the exception of possible copy ratio changes on chromosomes 9, 10, and 11.
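For readers unfamiliar with chromosome-level copy ratios, here is one common way to define them. This report does not state how its copy ratios were computed, so the sketch rests on an assumption: the ratio is the log2 of a sample's share of total read depth on a chromosome relative to a reference's share. The function name `log2_copy_ratio` and all depth values are invented for illustration.

```python
import math


def log2_copy_ratio(sample_depth, reference_depth):
    """Per-chromosome log2 ratio of depth, normalized by each library's total depth."""
    sample_total = sum(sample_depth.values())
    reference_total = sum(reference_depth.values())
    ratios = {}
    for chrom, depth in sample_depth.items():
        sample_frac = depth / sample_total
        reference_frac = reference_depth[chrom] / reference_total
        ratios[chrom] = math.log2(sample_frac / reference_frac)
    return ratios


# Made-up depths for three chromosomes; NOT values from this report.
reference = {"chr1": 100.0, "chr2": 100.0, "chr3": 100.0}
sample = {"chr1": 50.0, "chr2": 100.0, "chr3": 150.0}

ratios = log2_copy_ratio(sample, reference)
for chrom, value in ratios.items():
    print(chrom, round(value, 3))
```

Under this definition a value near 0 means no change relative to the reference, negative values suggest loss, and positive values suggest gain. A noisy sample would show widely scattered values, which is why conclusions about it call for caution.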